-
Notifications
You must be signed in to change notification settings - Fork 25.6k
Replace CentroidQueryScorer with CentroidIterator #131824
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Pinging @elastic/es-search-relevance (Team:Search Relevance) |
|
while I agree on the rationale, the |
|
@tteofili you can use #topScore() to get the current score, why would you want to store those scores on a list? |
|
@tteofili I had a look into your patch and you can add the logic when computing the iterator and add a method to the iterator API called something like getRecommendedNprobe or something of those sorts. |
yes, thx, I can do something along those lines although it might sound a bit weird for such an iterator to have those optimization responsibilities. not 100% sure though |
|
Still your optimization considers that we will always score all the centroids, if we ever add a parent layer to the centroids, then your optimization will not work. I think we should not assume we have all the scores for the centroids. |
|
I'll open an issue as soon as I have more confidence that doing anything like that is useful to speedup search (e.g., for multi-segment), and we can discuss that there. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, other than an inconsistency between code and comment.
| // Here we assume `lx` is simply bit vectors, so the scaling isn't necessary | ||
| float lx = (targetCorrections[1] - ax) * FOUR_BIT_SCALE; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've noticed this while going through the code with @john-wagster , the comment says scaling isn't necessary but then we do scaling, either the comment is outdated or we have a bug (seen it in other parts of the code).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
copy paster error in the comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm good with this. No concerns. I think it's a good clean up. My only thought is the centroid 1bit quantization / rescoring work is going well and will probably be beneficial and is definitely going to conflict heavily with this change. And that work will likely benefit more so from this clean up. So may be worth considering waiting till early next week when that work hopefully goes in?
|
I spoke to @wags and we agree that this is a good step forward, so merging. |
I think the CentroidQueryScorer leaks a lot of implementation details on our search strategy. we only want an object that feeds the algorithm with the centroids to be searched so I think it make sense to change the CentroidQueryScorer interface with a CentroidIterator interface. In this case, the details of how we build the iterator is hidden for the search strategy.